OPTIMAL ESTIMATION OF FREE ENERGIES AND STATIONARY 
DENSITIES FROM MULTIPLE BIASED SIMULATIONS 

HAO WUtAND FRANK NOfit 

Abstract. When studying high-dimensional dynamical systems such as macromolecules, quan- 
tum systems and polymers, a prime concern is the identification of the most probable states and their 
stationary probabilities or free energies. Often, these systems have metastable regions or phases, 
prohibiting to estimate the stationary probabilities by direct simulation. Efficient sampling methods 
such as umbrella sampling, metadynamics and conformational flooding have developed that perform 
a number of simulations where the system's potential is biased such as to accelerate the rare bar- 
rier crossing events. A joint free energy proflle or stationary density can then be obtained from 
these biased simulations with weighted histogram analysis method (WHAM). This approach (a) 
requires a few essential order parameters to be defined in which the histogram is set up, and (b) 
assumes that each simulation is in global equilibrium. Both assumptions make the investigation of 
high-dimensional systems with previously unknown energy landscape difficult. Here, we introduce 
the transition matrix unweighting (TMU) method, a simple and efficient estimation method which 
dismisses both assumptions. The configuration space is discretized into sets, but these sets are not 
only restricted to the selected slow coordinate but can be clusters that form a partition of high- 
dimensional state space. The assumption of global equilibrium is replaced by requiring only local 
equilibrium within the discrete sets, and the stationary density or free energy is extracted from the 
transitions between clusters. We prove the asymptotic convergence and normality of TMU, give an 
efficient approximate version of it and demonstrate its usefulness on numerical examples. 

Key words. Markov chain, maximum likelihood estimation, error analysis, free energy, station- 
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1. Introduction. Stochastic simulations of chemical, physical or biological pro- 
cesses often involve rare events that render the exploration of relevant states, or the 
calculation of expectation values by direct numerical simulation difficult or impossi- 
ble. Examples include phase transitions in spin systems |22l 15]. transitions between 
different chemical states in quantum dynamics simulations [13' or conformational tran- 
sitions in biomolecules tSj. For this reason, many enhanced sampling techniques have 
been developed to modify the dynamics of the original simulation system such that 
the relevant rare events become more frequent and can be accessed by direct numeri- 
cal simulation of the modified simulation system. Such an approach is of course only 
reasonable if there exists a way to reliably compute at least some quantities of inter- 
est of the original simulation system from the realizations of the modified simulation 
system. 

In this paper we focus on processes that are asymptotically stationary and er- 
godic, and on enhanced sampling approaches that use bias potentials (or equivalently 
conservative bias force fields) that attempt to modify the original system's dynamics 
so as to avoid rare events. Well-known examples of such approaches are Umbrella 
Sampling [26\ , conformational flooding f5^ , Metadynamics and variants jl2l [T] . These 
approaches assume that the modeler has some knowledge of coordinates or order pa- 
rameters that are "slow", i.e. that the rare event dynamics of the system is resolved 
by state transitions in these selected coordinates. 

Umbrella sampling defines a series of K biased simulations, each of which uses 
the forces from the original dynamics plus the forces due to one of K harmonic poten- 
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tials. These potentials restrain the biased simulations to stay close to positions x^; in 
the selected coordinates which are the centers of the umbrella potentials. The force 
constant (s) of these potentials must be chosen such that the corresponding biased 
stationary densities overlap significantly and the unification of all biased stationary 
densities covers the part of state space in which the original stationary density is 
significantly greater than zero. In this case, the K biased simulations, together with 
the knowledge of the umbrella potentials can be used in order to estimate the orig- 
inal stationary density in the selected coordinates (or the corresponding free energy 
landscape). 

Metadynamics is based on an opposite philosophy. Rather than by constraining 
the simulation to a set of points, it adds bias potentials to drive the simulation away 
from regions it has sampled sufficiently well. In practice this is often done by adding 
Gaussian hat functions to the bias potential every n simulation steps, centered at the 
current simulation state. We consider that this happens a number K times, leading 
to K simulation snippets, each with a different biasing potential. Due to limitations 
of filling high-dimensional space volumes, these bias potentials live usually also in a 
few pre-defined coordinates. Since the sequence of added bias potentials depends on 
the simulation history, metadynamics is usually used to first "fill up" the free energy 
wells until the states that cause the rare event waiting times have been destabilized 
and the corresponding free energy landscape is approximately "flat". It can be shown 
that at this point continuing the metadynamics simulation will sample bias potentials 
that are the negative free energy landscapes of the original system, up to an arbitrary 
additive constant. Since metadynamics does not require the modeler to know the 
relevant states along the slow coordinates, it is not only an approach to quantify 
the stationary distribution / free energy landscape of the original system but has 
been very successful in terms of exploring the state space in complex systems |16| . 
Unfortunately, this approach of using metadynamics also appears to suggest that all 
simulation effort that has been spent until the free energy surface is approximately 
flat cannot be used for quantitative estimations. 

Here we concentrate on the step of unbiased the modified dynamics so as to 
obtain the stationary distribution of the original dynamical system. For both, um- 
brella sampling and metadynamics, the step of "unbiasing" is usually done with the 
weighted histogram method (WHAM) fS^. WHAM uses a discretization of the se- 
lected coordinates in which the biased simulation was done, collects a set of K his- 
tograms, one for each of the biased simulations. These biased histograms are then 
combined to a single unbiased histogram by solving the self-consistent set of equations: 



counts in histogram bin i for simulation k, Mk is the total number of samples gener- 
ated by the fc-th simulation, z'^ is a normalization constant, is the unbiasing factor 
of state i at simulation k, and tTj is the unbiased probability of state i at simulation 
condition k. 

The assumption used by WHAM is that each of the K simulations done at dif- 
ferent conditions is sufficiently long such that they generate unbiased samples of the 
corresponding biased stationary density TTfc. In order words, each sub-simulation is as- 
sumed to be in global equilibrium at its conditions. We will see that restricting to this 
assumption is unnecessary, and a method that does not rely on this assumption can 
provide estimates that are substantially more precise (or, equivalently, require sub- 
stantially less simulation effort for a given level of precision) , even in trivial double- well 
examples. 




where is the number of 
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This paper develops the transition-matrix based unbiasing method (TMU) which 
replaces the assumption that the K biased simulations are in global equilibria by 
the much weaker assumption that each simulation is only in local equilibrium in the 
discrete states on which the stationary distribution is estimated. TMU has been 
motivated by the recent progresses in Markov modeling |25l HI |T5J [T^ , and constructs 
the joint unbiased stationary distribution from a series of K transition count matrices 
estimated from the K biased simulations. However, it is important to note that TMU 
does not need the discrete dynamics to be Markovian. 

Subsequently, we describe the basic mathematical assumptions underlying our 
method, then describe TMU in its most general form and show that the method has 
always a solution that is asymptotically normal and convergent. We then provide an 
approximate TMU that is efficient for very large state spaces and a large number K of 
sub-simulations. The method is demonstrated in conjunction with umbrella sampling 
and metadynamics on double-well potentials and its performance is compared with 
that of the standard WHAM method and a recently introduced method MMMM that 
had a similar motivation [21 . 

2. Background. In this section, we briefly review the mathematical background 
of biased simulation techniques. Let us consider a reference system in which the 
conflguration space can be decomposed into a discrete state set S with free energy 
V = [Vi] by using some sort of discretization (where Vi is the energy of the i-th state 
in S). If we denote the system state at time thy xt, the state sequence {xt \ is then 
a stochastic process. In this paper, we focus on processes {xt] with the following 
properties, which are relevant for many physical simulation processes: 

Asymptotic stationarity. It means that the sequence {xt\t > t} is approxi- 
mately stationary if r is large enough. More formally, {xt} is said to be asymp- 
totically stationary if there exists a family of distribution functions Fn : 5" i~?> K 
such that limT-_j.oo Pi'(.Tt-+i = si, . . . , Xr+n = s„) = F„(si, . . . , s„) for all n > I and 
si, . . . , Sn € S. Specifically, liniT^ao Pr {Is i^r) = i) = T^i with 



(2.1) 



exp {-pVj) 
E,exp(-/3y,) 



where tt = [tt^] denotes the system stationary distribution and /3 is a constant and 
generally proportional to the inverse temperature in physical systems, and we denote 
the limit lim^^^oo Pr [Is {xt+i) = j\Is (a^r) = «) by Tij. 
Wide-sense ergodicity. That is. 



1 * 



(2-2) plim — — 2^ lis(x^)=i = 



and 



1^- 



(2-3) plim - 2^ lIs(x^)=^ ■ lis{x^+,)=j = 7^ 



Detailed balance. The detailed balance condition can be written as 
(2.4) lim FT{Is{xt)^iJs{oot+i)^j) - 1™ FT{Is{xt) ^ jjs{xt^i) = i) 

t^oo t^oo 
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or equivalently TTiTij — "KjTji, for all which implies that each state transition has 
the same probability as its reverse. Note that this property clearly holds for systems 
that are time-reversible at equilibrium. 

Remark 2.1. For convenience, we measure energies Vi in units of thermal energy 
^BTthcimai with fee the Boltzmann constant and Tthcrmai the thermodynamic tem- 



perature, yielding /3 = 1 in (2.1 1. Furthermore, we assume without losing generality 



that all involved free energies in this paper are zero- mean. Then (2.11 has a unique 
solution 

(2.5) Vj = - log TT, + ^ ^ log TT, 

i 

for a given stationary distribution. 

Based on the asymptotic stationarity, we can define a matrix T = \Tij\ G mI'^I^I'^I 
to represent the stationary state transition probabilities. It is easy to see that T 
satisfies the condition that each row sums to 1, so here we call T the transition 
matrix of {xt\ for simplicity even if {xt\ is not a Markov chain. 

For now, our goal is to estimate V , or equivalently it. from simulations in the case 



that it is unknown. Due to the ergodicity (2.2 1, when sufficient simulation data can 
be generated, we can simply carry out one or multiple simulations of the reference 
system and get the estimate of V through computing the histogram of simulation data. 
This approach, however, is very inefficient when the reference system has multiple 
metastable states, because the simulation process is very likely to get stuck in some 
local minima of the energy landscape for a long time. To this end, biased simulation 
techniques, such as umbrella sampling [23] and metadynamics |T^, were developed to 
solve this problem, which perform simulations with a set of biased potentials so that 
the energy landscape can be explored more efficiently. 

Although many practical algorithms use a different approach, we can roughly 
summarize the estimation of stationary distributions through biased simulations in 
terms of the following pseudocode: 

Step 1 Design a set of biasing potentials \U^\k = 1, . . . where — S 
Ml5|xi, 

Step 2 Repeat Steps 2.1 and 2.2 for A: = 1, ... , K: 

Step 2.1 Reduce the state set to 5*^ C 5 and change the system potential 
as 

(2.6) = FiD^w + t/i'b^w - 4^ E (^^- + ^j) 

where V''- = [Vl"] G is called the biased potential, ID*^ (i) = 

Is (i-th element of iS*^), = {/^ (s) |s e 5*^}, and the last term is used 
to shift the mean of to zero. 
Step 2.2 Perform a biased simulation using the same simulation model as 
the reference system except that the state set and potential energy are 
changed from 5, V to 5*"', V*' . and record the simulation trajectory x^.j^^. 
Step 3 Estimate the reference (unbiased) free energy V from {xq.^^ |fc = 1, . . . 
In this paper, we will focus on the estimation problem in Step 3. We start with 
the assumption that each simulation a^g.jvffc ^ Markov chain, and the developed 
estimation method will then be proved to be applicable to more general simulation 
models. 



Remark 2.2. (2.6 I can be written in a more compact form by defining a potential 
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transformation matrix A'' = e mI'^I^I'^I with AJJ^- = IjeJ" ■ {hu''it)=j 

such that =A''{V + U^) . 

3. Maximum likelihood estimation from multiple simulations. 

3.1. Maximum likelihood estimation. In this section, we investigate a max- 
imum hkehhood approach to the estimation problem described in Section [2] Suppose 
that each simulation a^o-jv/t ^ time-homogeneous Markov chain. According to the 
third simulation property stated in Section |2j it is easy to see that a^o j\/fc ^ reversible 
Markov chain. (It can further be proved that x^.j^^ is irreducible and positive recur- 
rent under the condition of finite by using the first and second properties.) The 
maximum likelihood estimation (MLE) of the unbiased stationary distribution tt can 
then be obtained by solving the following optimization problem: 



(3.1) 



max 

Tr,Ti,...,T^ 
S.t. 



L = log Pr {xU,^ T'=) log ^ 

TT is a probability vector 
is a transition matrix 
diag (tt^) = T'^'^diag {-k^) , 



A: = l,...i^ 



where T'^ — [T*] and t:^ — [7r,f] denote the transition matrix and stationary dis- 
tribution of the fc-th simulation, = [Cj^] = the count matrix of 

the A:-th simulation and ACf ^ [^Cl^j\ = [l(/^, (x^Jj^. (.?))=(. j)] • (Here we 
set OlogO = and alogO — — oo if a > 0.) The last constraint is obviously the 
detailed balance constraint for the biased simulations, and can be equivalently writ- 
ten as diag (A'^tt) T'= = T^'^di&g (A'^tt) with A''' = [A^ = [liDfcc^)^^ ' exp {-U^)] S 
]^|5 |x|5|^ After performing the MLE of tt, the optimal estimate of V can also be 
obtained by using (2.5). 

Theorem 3.1. The optimization problem (3.1) has at least one optimal solution 
satisfying 



ij 

1T;'',>0 



0/or (j,j,fc)e {(i,j,fc)|C^. 



1. 

2. 

Proof. See Appendix [A] □ 



C. 



lc'=.>0; ^f^u > and lc'».>o 



— and i j} . 
lc'=.>o for all i,j,k. 



According to Theorem |3.1| we can reduce the dimension of the optimization vari- 
able of (3.1), and the reduced problem can be written as 



max 

""^{TijlCfj+C^.yO or 

(3.2) s.t. 



TT is a probability vector 

T'' is a transition matrix 

diag (A'^tt) T'^ = T'^Tdiag (A'^tt) , A: = 1, 



K 



with T;^ = for {i,j,k) S {{ij,k) \C^j + C^i = 0,i^ j}. It is clear that the problem 
size of (3.2) is much smaller than that of (3.1) especially when count matrices are 
sparse. However, even if each C'' is sparse with O {\S\) nonzero elements, the reduced 
problem (3.2 1 involves O {K \S\) decision variables and nonlinear equality constraints. 
(Note that tt and T*^ are both unknown in the last constraint.) It is still inefficient 
to search the optimal solution of (3.2) by direct methods. In Section |4] we will adopt 
an approximate MLE method to improve the efficiency based on the decomposition 
strategy. 



6 



OPTIMAL ESTIMATION OF MULTIPLE BIASED SIMULATIONS 



3.2 . C onvergence analysis. The MLE method of stationary distribution in 
Section ', 



3.1 



is motivated by the assumption that Xg.j^^ is a Markov chain. Interestingly, 
it turns out that the Markov property is not necessary for the convergence of MLE. 
In this section we will prove the convergence of MLE under more general conditions. 

First of all, we provide an intuitive explanation for why the MLE can work for non- 
Markovian stochastic processes. Generally speaking, if there is no other knowledge 



available, T can be estimated as T — 



with T!/!- being the fraction of observed 



transitions from the z-th state to the j-th state: 



(3.3) 



But the transition matrix estimates obtained from (3.3 1 generally do not satisfy the 



detailed balance condition and do not share the same unbiased stationary distribution 
for finite-time simulations. Therefore we search for the feasible transition matrices 
which are close to , . . . , : 



mm 



(3.4) 



s.t. 



TT is a probability vector 

T'^ is a transition matrix 

diag (A'^tt) T'' = r'^T^jjag ^^fc^^) ^ k ^ I, 



.K 



where Wk = M-f-jM is the weight of the simulation fc, = Mj. denotes the total 
length of performed simulations, Tif oc C^^ is an estimate of 7rf , and KL ^Tf ^ = 

log {TijlTij^ denotes the KL divergence [U] between the z-th rows of and 
. WfcTrf gives the fraction of occurrence of state i in simulation k. Note that 



J^z&fc^fKL {f^\\T^ 

i.k 



■ Wk^lft^ {log ft^-logTt^ 



(3.5) 



^q}(logT5)-^C^(logI^) 



and thus (3.4 1 is equivalent to (3.1). Therefore the MLE can be considered to be a 



projector which projects the counting estimates (3.3 1 onto the feasible space. Fig. 3.1 
shows the relationship between different estimates and the true values of (T^ , . ■ . , T^), 
where the estimates obtained by counting converge to their true values due to the 
wide-sense ergodicity of simulations. 

Remark 3.2. In [19' , a KL divergence rate was derived to measure the distance 
between Markov chains. Suppose {xj} and {a;"} are two Markov chains on the same 
state set with stationary distributions 7r',7r" and transition matrices T',T", then the 
KL divergence rate between them can be defined as 



KLR(7r',r'||7r",T") 



(3.6) 
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Figure 3.1: Relationship between the estimates obtained by counting, the maximum 
hkehhood estimates and the true values of transition matrices 



where tt^ denotes the i-th. element of tt' and T'^jT'/ denote the i-th rows of T',T". It 



is easy to see that the objective function of (3.4 1 is equivalent to the weighted sum of 
KLR (n'',f^\\T:'',T''^ with weight Wk ■ 



The equivalence of (3.11 and (3.4| implies that the consistency of the MLE can 



hold without assuming the Markov property if it can be shown that the error of the 
MLE converges to zero when T^, . . . , converge to their true values. Before proving 
the main theorems, we make some assumptions and introduce some notation. 

Unless stated otherwise, all convergence statements are made with respect to 
Af — >■ oo in this paper. 

, . . . , and V are finite. 



The limit w w exists and w >- 0, where w 



Assumption 3.3. 
Assumption 3.4 

W = [Wi]. 

Assumption 3.5. Xq.j^j ,...,Xq.]^,j are all asymptotically stationary and wide- 
sense ergodic processes with detailed balance. 

Assumption 3.6. For any i,j,k, if there exists some t such that Pr{Igk {x^) = 

i, Isk (xt^i) = j) > 0, then lim„^oo Pr{Isk (x^) = i, I^t (xn+i) j) > 0. 

Assumption 3.7. n — ?: is the unique solution of the following set of equations 
and inequalities: 



(3.7) 



diag (A'^tt) T'' = T^^^diag (A'^tt) , 
TTi = 1 and TT )^ 



for fc = 1, 



The above assumption means the unbiased stationary distribution can be uniquely 
determined if all the transition matrices are given. 

Furthermore, here we let 6 be the vector consisting of elements of the unbiased 
stationary distribution tt and transition matrices T^, . . . , X*' ~ — ["""f-^ij] 

Xfi = Tfff'' . Q be the feasible set defined 



with X'' = [Afj-] = [Tff T^-] and A*^ 



by constraints in (3.11, Vx 
(3.8) 



{w^V{X^f ,...,WKV{X^f) 



and 



Qie) = l^ wkX^^ log , Q{e) = Y, ^kX^j log Tt, 



Notice that functions Q (9) and Q {9) is linear in parameters Vxw and Vxi 
can construct a function $ (9) such that 



then we 



(3.9) 



'i0) = VL^iO), Q{9) = V^^^{9) 



8 



OPTIMAL ESTIMATION OF MULTIPLE BIASED SIMULATIONS 



Based on the above assumptions and notations, the maximum HkeHhood estimate 
can be expressed as 6* = argmaxg^e Q (6), and we have the following theorems on the 
convergence of 9. 

Theorem 3.8. If Assumptions 



3.3 



3.1 hold, then 9 A 



Proof. See Appendix [B|D 

Theorem 3.9. // Assumptions \3.3'^3. 7| hold and the following conditions are 
satisfied: 

1. For each k, there exists a such that 



(3.10) 



the 



(3.11) 



9 is obtained with variable reduction as in ^3.S^ . 
Diagonal elements of X^, . . . ,X^ are positive. 
All K simulations are statistically independent. 

sfM {w - w) 4 q. 

H = ^g^e^Q {p {(^r)) is nonsingular with 9r the vector consisting of {Tf!j\X^j 
0,i < j} and {tti, . . . ,7r|5.|_i}. 



> 



,We^^{9{9r)) andSx^ = diag(i()iSif,...,iDKSf). 



3.9 



where S = (V(,^$ {9 (6*^))) Ex 
Proof. See Appendix [C[ □ 

Remark 3.10. Since X*^ = C'^/M'', Condition [l] stated in Theorem 
that the central limit theorem holds for {V [AC^)} , and can be justified by using 
Markov chain central limit theorems [10 in many practical situations. For example, 
one sufficient condition for this is that for each simulation k, {a;^'} is obtained by 
coarse-graining a geometrically ergodic Markov chain {ut} with = f^ {yt)i which 
implies that 5'^ defines a partition of the state space of {vt} and each s € 5'^ is 
associated with a cluster (r) = s} in the state space. (See Appendix [p] for 

details.) 

Remark 3.11. In this section we only characterize the convergence of tt. For 
the MLE of free energy V, the consistency and asymptotic normality are immediate 
consequences of Theorems |3.8| and |3.9| by considering that is a smooth function of 
TT. Here we omit the detailed description and proof as they are trivial. 

4. Approximate maximum likelihood estimation. In this section, we de- 
velop an approximate MLE method based on a decomposition strategy in order to 
improve the efficiency of MLE, and the convergence of the method is shown. 

For convenience of analysis and computation, here we introduce two new variables, 



and 



the 



is a modified count matrix used to replace C'' 
approximate MLE, and is assumed to satisfy the following assumption: 

Assumption 4.1. C^,...,C^ are irreducible matrices with positive diagonal 



elements, and satisfy that 1 



Cf,>0 



and Ifjk^Qk — 1 for all i,j,k. 



One way to perform the count matrix modification is as follows: 



(4.1) 



max {Cij,S} , Cji > or i = j 
C*;- , otherwise 
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where S € (0, 1) is a small number. (This approach is similar to the so-called neighbor 
prior used in [T71 1^.) 



Theorem 4.2. If Assumptions 3.3 3.5 and 3.6 hold, and if X^^ > and 
^t=o^i j.(2;fc)=i > for all i,k, then the modified count matrices defined in 
satisfy Assumption \4. 1\ 

Proof. See Appendix |e1 □ 

The variable Z'' = \zf^ is defined by 

(4.2) exp {-Z%) cc X^^ 

which can be interpreted as the "free energy matrix" of state transitions in the k- 
th simulation because exp (^—Z^j) cx limt_j.co Pr (^5 (a^t) = h Is {^t+i) = j)- Like the 



free energies of states, we also assume that j)|x''.>o} ^ij ~ ^ such that (4.2 1 

has a unique solution with given X''. 

4.1. Decomposition and approximation of MLE pr oble m. Under the above 
assumption and variable definitions and replacing C'^' by C*' , (3.2 1 can be decomposed 
into K + 1 subproblems as follows: 

Local subproblems {k = 1 . . . , K). 



(4.3) 



L'^iV^)^ max - E.., e?,4- + C.^^f 

{^?,IGfj>o} 
s.t. = Z'^T 



k 



where Zf^- = oo if = 0, O-oo is set to be 0, = - log Ej exp (-Z^j) , = J2j 0-% 

and = i^Zi, . . . , Z^^^^j — J2i / I'^'^l denotes the the state potential obtained 

from Z*^. A brief description of the objective function and constraints is given below: 

1. The objective function is the log-likelihood of a Markov chain model with free 
energy matrix Z^ given C*^ , because log T!^- = Zf — Z*j . 

2. The first two constraints guarantee that the detailed balance condition holds 
for the fc-th simulation and the finite elements of Z'^ have zero mean, and can 
be eliminated by substituting 

4 ip") - 



(4.4) 



zf,, ^ c:-,>o,z>j 



in the objective function, where p'' denotes the vector consisting of {Z^^ jC^j- > 

3. The third constraint means the state potential obtained from Z*^ is equal 
to the given V'^. Note that both and V'' are zero-mean vectors, so this 
constraint can be equivalently written as [ I ] = [ ■"■ ^ ] 
Thus, L'' {V'') is equal to the maximum log-likelihood of a reversible Markov chain 
model under the condition that V'' is known, and the local subproblem has the fol- 
lowing equivalent formulation: 

i'= (V'') = max - E, , C^.Z^^ + E, C^Z^ 

S.t. [I ] v| = [ I ] 
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where and are both viewed as functions of . In comparison with (3.2 1, each 
local subproblem with a given only depends on one simulation and independent 
of the others. 

Global subproblem. 



(4.6) 



max Ef^i^M^M^- 
s.t. = Q 



The objective of this subproblem is to maximize the sum of maximum log-likelihoods 
for K biased simulations by selecting the best unbiased free energy. 

Obviously, if local subproblems can be efficiently and exactly calculated, the MLE 
of unbiased free energy V can be easily obtained from (4.6 1 because only \S\ vari- 



ables and one linear equality constraint are involved in the global subproblem. How- 
ever, local subproblems are nonlinear optimization problems with nonlinear equality 
constraints, of which solutions are generally computationally expensive and time- 
consuming. It is therefore necessary to find tractable approximate solutions of local 
subproblems. In the following, we address the approximations and solutions of the 
subproblems in detail. 

4.1.1. Local subproblems. Inspired by the fact that equality constrained quadratic 
programming problems can be solved analytically, here we present a tractable approx- 



imate solution of (4.6|) based on Taylor expansions of the objective function and the 
constraint equation, 
steps. 



The approximate solution method consists of the following three 



Search for the optimal solution of (4.5) without the constraint on the 



free energy. Omitting the constraint on the free energy, (4.5) can be written as 



(4.7) 



max 

pk 



and the gradient and Hessian matrix of the objective function used in optimization 
can be simply obtained by the following equations: 



(4.8) 
(4.9) 



dzl 



lj=/=i 



im ill 



1 



k ) 



Furthermore, we can verify that the Hessian matrix yC^ , 

negative-definite under Assumption 4.1 (see Appendix [f|, which means (4.5) is a con- 
vex optimization problem with a single global optimum that can be solve d effi ciently 
using any local optimization algorithm. Here we denote the solution of (4.7 1 by p^, 



and we will employ a conjugate gradient algorithm [27^ to find p^ in our numerical 
experiments. 



Construct a quadratic programming approximation of (4.5) by Taylor 



expansions. Replacing the objective function and the nonlinear constraint of (4.5 1 
by their second-order and first-order Taylor expansions around p^ , we can get the 
following approximate local subproblem: 



(4.10) 



S.t. 



yk 



max 

pk 



1 V 



yk 
p" ^ Z 



H 



'p{c',p' 



-P-) 
I 
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where = {p^) 



and L'' = L'' (V'' 



Calculate L^^ {V'') from (4.10). It is easy to verify that [I ] V pkV^ [p^] 



is full row rank. Then using the Lagrange multiplier method, we get 



(4.11) 
where 

(4.12) 



^ [p. ,p 



I 



[I o]v, 



h'^A&'.p'' 



([I ^ ]v ,.vl {p^)fY [10] 



is negative-semidefinite and satisfies [ I ]~^(c\p^) [I ]^ <0. 



4.1.2. Global subproblem. Substituting (4.11 1 into (4.6 1 leads to the approx- 
imate global subproblem: 



(4.13) 



max Y.til{A^{V + U'^)-V^)'^^(Q_\, 



s.t. 



By applying the KKT conditions, it is easy to verify that the solution of (4.131 is 



K 



(4.14) V 




Remark 4.3. We can now explain why variables , V'' are used instead of T'',tt'' 
in the problem decomposition and approximation. First, the approximate local sub- 
problem based on Taylor expansions w.r.t. T^,tt^ involves inequality constraints and 
therefore has no analytic solution. Secondly, even if we can get a quadratic poly- 
nomial approximation to the function L*' {V^ i'^'')) t>y some method (e.g., omitting 
the inequality constraints in approximate local subproblems as in [211), the resulting 



approximate global problem cannot be solved as easily as (4.131 because both V and 
TT are nonlinear functions of tt'^ . 

4.2. Approximate MLE algorithm. Summarizing the above discussions, the 
approximate MLE algorithm of the unbiased free energy is described a s fol lows: 
Step 1. For k = 1, . . . , K , calculate the modified count matrix C'' by (4.1 ). 
Step 2. For k ^ 1, . . . , K, search for the optimal point p'' of the programming prob- 



lem (4.71 



Step 3. For fc = 1, . . . calculate E'' (c_'',p''^ by (4.121 



Step 4. Calculate the approximate maximum likelihood estimate of V by (4.14) 



4.3. Convergence analysis. In this section, we will analyze consistency and 
asymptotic normality of the approximate MLE as the exact MLE under the assump- 
tions stated in Section |3.2| and Assumption |4.1[ Before introducing the main theo- 
rem, some definitions and a lemma are needed. Let p'^ and p'' be vectors consisting of 
{Z^m > 0,z < J, ^ {\S'^\ , IS^l)} and {Z^j\X^j > 0,^ < J, ^ [\S'^\ , l^'^])}. 



(4.15) 



i,3 



j2w,x:^znp' 
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(4.16) 



and X — 



- k 



Q' {ft) = - E 4 {ft) + E ^'^^z^ it) 

{f^) and (/) 



Cr/M". Like Q {9) and Q {9) in Section 



can also be written as 



3.2 



- k 



(4.17) Q'= (/) - WkV [X j $fc (/) , Qfc (/) ^ ,2),v (X'^) $fc (/) 

and p*^ = argmaXpfc (p*"') if dim (p'"') = dim (/O*^)- 

Lemma 4.4. Provided that Assumptions \3. 3p?7| and \4.1\ hold. 

1. 4 p^ and A V^. 

2. /Mfe (V''^ - 1/'=) 47V(0,S^) if (3.101 is satisfied, where 



(4.18) 



(4.19) 



yk 



Proof. See Appendix [GjO 

Theorem 4.5. Provided that Assumptions \3^3.T\ and \4.1\ hold. 



2. Vm {V -V) 4 A/'(0,I]y) if (3.101 is satisfied for all k and K simulations 
are statistically independent, where 



K 



(4.20) 



/ K 

fe=i 

x + 

fe=l / 



and Sy has the same definition as in Lemma 
Proof. See Appendix [H] □ 



4.4 



4.4. Error analysis. According to Theorem 



4.5 



the estimation error of V 



follows approximately a multivariate normal distribution A/" (0, Sy/Af ) when M is 
large enough, and can be estimated by (4.18), (4.19) and (4.20) with replacing 

- - k 

Wk, X'^ , p'^ with Wk,2L tP^ if given. Therefore, the remaining key problem 

is how to estimate E^. In this section we present an algorithm for estimating 
based on the following assumption, which is similar to the assumption proposed in 
Remark |3.10| and implies that each simulation is driven by a stationary, irreducible 
and reversible Markov model. 

Assumption 4.6. For each simulation k, can he expressed as a function 
of a latent variable with Xf = f*' {yt) o,nd {j/f } is a stationary, irreducible and 
reversible Markov chain. 
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Under Assumption 4.6 it can be seen that | V [AC^) } is also a stationary process. 



To describe the estimation algorithm, we also need some new notation. We denote by 

(4.21) k'^ (h) = Gov (V {AC'^) , V (AC.'V,)) 

the autocovariance of { V (^AC^) } with lag h. It is clear that the {{i — 1) \S'' | +j, (m — 
1) jiS'^ |+n)-th element of (h) is equal to the covariance between 1^^ ^ ) i k {x^))={i j) 
and 1(,^^ (.^^,_ J j^,(.^^^ Furthermore, let V'^ (l) = {21 1 1) {21 + 2), 
and rj'^ {I) be a sum of some elements of F'^ {I), which can be represented as 

(0 - ^Cov (l(/^,«J,/^,(.f))=(j,^)' ^{is^{-i+,,)Js.{-^+2,+i)h(^,j)) 

(4.22) +Cov (l(/,,(.tJ,/,,(.^))=o-..), !(/,. «,,+,),/,. «,,+,))=(.j)) 



Theorem 4.7. If Assumption 4-6 and (3.10) holds, and the series J2'hLo^'' (^) 
is convergent, then 



1. s^ = «'=(o) + E;^o(r'=(0 + rMO^ 

2. (0 > ||r''' (Oil ^ and rj'' {I) < r^^ [l + 1) for I > 0. 
Proof See Appendix]!] □ 



The above theorem provides an intuitive way to estimate 



(4.23) = k'' (0) + ^ (f '= (0 + f (0^ 

l=Q 

where k'' (0) and T'' (/) denote the estimates of k*^ (0) and F'"' (1). We now investigate 
the calculation of k'^ (0) and {I). 

Estimation of k*^ (0). It is easy to verify that the ((«— 1) l^''" |+j, (m— 1) |5*''|+n)- 
th element of k'^ (0) equals l(jj)^(„ — X^^Xi^^, therefore we can calculate the 

the element in the same position in k'^ (0) by l(i n)^ij — ^ij^mn with X^j = 

Estimation of F*"' (l). For h > {), {h) can be estimated by the empirical 
autocovariance: 



(4.24) [h) = —Y, (V {AC^) - V {X^)) (V (ACf+,) - V {X^))' 



t=i 



where X'^ is an estimate of E [AQ*^] = X'^. Then the F'^ (/) can be estimated as 
(4.25) f (0 = {21 + 1) + k"= {21 + 2) 



However, the estimation error ( 4.25[ ) will increase substantially as / approaches to 



[{Mk — 3) /2J. So here we modify the F''^' {I) by correcting the corresponding esti- 
mated value of Ty'^ {I): 

( f'^(0, 1 = 

(4.26) f'=(0 = <^ min{Ci|^,l}.f"=(0, Z > 1 and ry"= (/) > 

[ 0, Z > 1 and fi"' {I) < 
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where f]"^ {I) and f]^ {I) denote the values of ■q'^ (/) obtahied from f (/) and f ''^ (l). 
It can be seen that [l) is non -neg ative and decreasing with /, which is consistent 



with the conclusion of Theorem 



4.7 



Besides, we can show that r*-' (/) = for / > 
if there exists an Z„ such that fj'^ [In) < 0. Thus the estimator of in this section is 
in fact a time window estimator [7j where the large-lag terms outside the window are 
set to be zero, and the window size = min {/l?)'^ [1) < O} implies that the curve of 
llr'^ (/)|| goes below the noise level at I — ly,. 



1^Yj\1 = 0, but the obtained by (4.231 may not satisfy the constraints. For 



Remark 4.8. From the definition of Sy we can deduce that > and 



this problem, we can correct the value of as := Ml (Mp (t 

X ) ) I where 

M p (G) = G— min {A„iin (G) , 0} I can map a symmetric matrix to a positive-semidefinite 
matrix with Amin (G) denoting the smallest eigenvalue of G, and M\ (G) = (l — ^ll"^) • 
G • (l — ^ll"^) for G € M"^" is a mapping from the positive-semidefinite matrix set 
to the set {SIE > 0, = O} . 

5. Numerical experiments. In this section, the approximate MLE proposed 
in this paper will be applied to some numerical examples of multiple biased simula- 
tions, and the performance will be compared to that of WHAM and MMMM. For 
convenience, here we denote a set of multiple biased simulations described in Section 
[2]by MBS {K, Mq) if there are K biased simulations and each simulation has the same 
length with Mk = Mq. 

5.1. Umbrella sampling with Markovian simulations. Umbrella sampling 
is a commonly used biased simulation technique, where each biasing potential (also 
called "umbrella potential") is designed to confine the system around some region 
of state space and achieve a more efficient sampling especially at transition states 
which the unbiased simulation would visit only rarely. In this example, the um- 
brella sampling simulations are employed on a reference system with state set S = 
{s, = -5 + 10{i- 1) /99\i 1, . . . , 100} and free energy V = [V^,] = [0.25s* - 5sf - 



9.9874]. As shown in Fig. 3.1 the reference system has two metastable states cen- 
tered at A and B, and the switching between metastable states is blocked by an energy 
barrier with peak position O. 

For umbrella sampling simulations, we design the following 15 different biased 
potentials: 



(5.1) = [Ut] 



15 60 
14^ 



1< fc < 15 



Note that these potentials will be repeatedly used if the simulation number is larger 
than 15, i.e., U'' = [/((^-i) mod i5)-i-i jf > 15^ xhe simulation trajectory x^.^^^ is 
generated by a Metropolis simulation model (see Appendix [j] for details), which is a 
reversible Markov chain with initial distribution 

(5.2) Pr (4^ = s,) cx exp (-t/f ) 
and stationary distribution 

(5.3) Trf (X exp {~V^ ~ t/f ) 

The comparisons between the estimation methods are based on the mean error 
of approximations of energy barrier heights: 



(5.4) e^y = ^ {\AVab - AVZ^n + I^Vba - Al/~i 



2 
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Figure 5.1: Free energy profile of the reference system, which has two potential wells 
with minima at A and B separated by an energy barrier. The highest energy position 
O of the barrier represents the transition state, and the energy barrier heights for 
transitions A — >■ i? and B ^ A are defined as ISVab = \Vo — Va\ and AVba = 
\Vo — Vb\ with Va, Vb and Vq the potentials of A, B and O. 



where the definitions of AVab and AVba are given in Fig. |3.1| and the superscript 
"approx" represents the approximate value obtained from the estimated V . 

We first set K ^ 15 and Mq = 500, 910, 1657, 3017, 5493, 10000, and perform 30 



independent MBS {K, Mq) for each value of Mq. Fig.|5.2a displays the average e^v of 



the approximate MLE, MMMM and WHAM for different Mq, and Figs. [5?2c| and [5^ 
show the estimates of V obtained from a run of MBS (15, 500) and MBS (15, 10000). It 
can be seen that the estimation errors of all the three methods decrease with increasing 
simulation length, and the proposed approximate MLE performs significantly better 
than the other two methods. Note that MMMM is also a Markov chain model based 
method, but its performance turns out to be worse than WHAM in this numerical 
experiment, especially for large simulation lengths Mq. 

Next, we validate whether the estimation methods can reconstruct the free en- 
ergy V from very short simulations. Here we fix the total simulation time MqK, and 
set Mq = [22500/ JC] with K = 45,90,135,180,225,270. The estimation results are 
summarized in Figs. 5.2b[ 5.2e and 5.2f (The la confidence intervals in Fig. 5.2f are 



provided by using the sample standard deviation of V calculated from the 30 inde- 
pendent runs of MBS (270, 83) because the simulation length is too short such that 



the error analysis approach in Section 4.4 is not applicable.) It should be noticed the 



equilibrium assumption used by WHAM does not hold if Mq is too small, because 
the initial distributions of simulations differ from the biased stationary distributions 
(see Appendix [j| . Therefore the estimation accuracy of WHAM is reduced when the 
individual simulation lengths are shorter although the total data size stays almost the 
same. In contrast, the proposed approximate MLE and MMMM are less affected by 
the change in the length of individual simulations. This is because these methods rely 
on having local rather then global equilibrium assumptions. Furthermore, the pro- 
posed method outperforms both WHAM and MMMM in this numerical experiment. 

5.2. Umbrella sampling with non-Markovian simulations. We now con- 
sider the estimation problem from an umbrella sampling simulation in the case that 
the Markov assumption does not hold, i.e., the bins used to estimate the free energy 
do not correspond to the Markov states of the underlying simulation. The simulation 
model and the other settings in this section is basically the same as in Section |5.1| 
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-Approximate MLE 
-ilMMM 

WHAM 



Approximate MLE 

---MMMM 

WHAM 



(a) Average bav calculated over 30 indepen- 
dent runs of MBS {K, Mo) for K = 15 and 
Mo = 500, 910, 1657, 3017, 5493, 10000. 



K 

(b) Average e/^v calculated over 30 
independent runs of MBS {K, Mo) 
for K = 45,90,135,180,225,270 and 
Afo = [22500//^]. 



— True value 

- - Approximate MLE 
©-MMMM 

• -WHAM 

la iutervals 




(c) Estimates of V generated by the different 
estimators on a run of MBS (15, 500) where 
the e^v of approximate MLE = 3.5015, e/^y 
of MMMM = 6.3153 and eav of WHAM = 
6.3500. 



Ti-ue value 
Approximate MLE 
MMMM 
WHAM 
1(T intervals 




(d) Estimates of V generated by the different 
estimators on a run of MBS (15, 10000) where 
the EAy of approximate MLE = 0.3060, e^y 
of MMMM = 0.6002 and cav of WHAM = 
0.3397. 



— True value 

- - Approximate MLE 
©-MMMM 

• -WHAM 

1(7 intervals 




(e) Estimates of V generated by the different 
estimators on a run of MBS (45, 500) where 
the eAv of approximate MLE = 0.4570, e^y 
of MMMM = 1.5012 and e^v of WHAM = 
1.8948. 



— True value 
- - Approximate MLE 
©-MMMM 
• -WHAM 

la intervals 




(f) Estimates of V generated by the different 
estimators on a run of MBS (270, 83) where 
the EAV of approximate MLE = 3.1939, e^y 
of MMMM = 4.2559 and cav of WHAM = 
7.2764. 



Figure 5.2: Estimation results of umbrella sampling with Markovian simulations. The 
1(7 conf iden ce intervals in (c), (d) and (e) are obtained by the approach described in 
Section 4.4 and those in (f ) are obtained from the sample standard deviation of V in 



the 30 independent runs. 
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except that the state set is defined as S — {si, . . . , sio} with si = {si, . . . , sio}, S2 = 
{sii, . . . , S15}, S3 = {si6, . . . , S20} • ■ ■ Si7 = {ss6, ■ ■ • , sgo} and sig = {sgi, . . . , sioo}- 
It is clear that the observed state sequences in simulations do not satisfy the Markov 
property with this definition of states. 

We utilize the three methods to approximate the free energy V by using the 
non-Markovian simulation data, and the estimation results with different {K, Mq) are 



shown in Fig. 5.3 where e^v is defined in the same way as in Section [5.1 [ with A,B 
and O the local minimum and peak positions in S. As observed from the figures, the 
estimates obtained from the approximate MLE are more precise than those obtained 
from the other estimators for various values of {K, Mq). 

5.3. Metadynamics with Markovian simulations. Metadynamics is another 
biased simulation technique often employed in computational physics and chemistry, 
which is able to escape local free energy minima and improve the searching properties 
of simulations through iteratively modifying the biasing potential. Given K, Mq and 
a reference system as in Section [5. 1| a metadynamics procedure can also be expressed 
as a run of MBS {K, Mq) with 

''^-{ur+u^s.l.^), Ill ' ^ork^l,...,K 

where Uc {s\x) denote a Gaussian function of s centered at x. Thus, for each of the K 
simulations in a MBS run, a Gaussian hat is added to the potential at the last point 
of the previous simulation. This effectively fills up the potential energy basins with 
increasing k. Ultimately the effective potential becomes approximately flat. Here we 
define Uc {s\x) = 5exp ^— (s — x)^^ , and the simulation data Xq.j^j^ is also generated 

by the Metropolis sampling model with Xq — x'lj\ 

The three estimation methods are applied to reconstruct the free energy of the 
reference system by data generated by metadynamics with different {K, Mq), and the 
estimation results are shown in Fig. |5.4[ The superior performance of the presented 
method is clearly evident from the figures. 

5.4. Metadynamics with non-Markovian simulations. In this example, the 
free energy estimation problem of metadynamics with non-Markovian simulations is 
investigated. We generate the simulation data as in Section [5T3| and convert the state 
sequences to non-Markov processes as in Section |5.2[ Then the three methods can be 
used to estimate the unbiased free energy of states si, . . . , sis- 

All the estimation results are displayed in Fig. |5.5[ It is obvious that the approx- 
imate MLE does a much better job in the free energy estimation than the other two 
methods. 

6. Conclusions. We have presented a transition-matrix-based estimation method 
for stationary distributions or free energy profiles using data from biased simulations, 
such as umbrella sampling or metadynamics. In contrast to existing estimators such 
as the weighted histogram method (WHAM), the present estimators is not based on 
absolute counts in histogram bins, but rather based on the transition counts between 
an arbitrary state space discretization. This discretization may be in a single or a 
few order parameters, e.g. those order parameters in which the umbrella sampling 
or metadynamics simulations are driven, or they may come from the clustering of 
a higher-dimensional space, such as frequently use in Markov modeling. The only 
condition is that the energy bias used in the biased simulations can be associated 
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-Approximate MLE 
-MMMM 

■WHAM 



7 
6 

5 

< , 
V 4 



Approximate MLE 

---MMMM 

WHAM 



§00 910 1657 3017 5493 10000 

Mo 

(a) e/^v calculated over 30 independent runs 
of MBS(X, A/o) for K = 15 and Mo = 
500, 910, 1657, 3017, 5493, 10000. 



45 90 135 180 225 270 

K 

(b) Average e/^v calculated over 30 
independent runs of MBS(if, A/q) 
for K = 45,90,135,180,225,270 and 
Afo = [22500/ K]. 



— True value 

- - Approximate MLE 
©-MMMM 
• -WHAM 

la iutervals 




(c) Estimates of V generated by the different 
estimators on a run of MBS (15, 500) where 
the e^v of approximate MLE = 2.4208, e/^y 
of MMMM = 6.2008 and eav of WHAM = 
6.1413. 



— True value 
- - Approximate MLE 
Q-MMMM 
• -WHAM 

la iutervals 




(d) Estimates of V generated by the different 
estimators on a run of MBS (15, 10000) where 
the e/^Y of approximate MLE = 0.4532, e^y 
of MMMM = 0.6289 and cav of WHAM = 
0.4711. 




'^^0 5 10 15 20 

state 



(e) Estimates of V generated by the different 
estimators on a run of MBS (45, 500) where 
the B/^v of approximate MLE = 0.4989, e^y 
of MMMM = 1.6107 and e^v of WHAM = 
0.6257. 



True value 

Approximate MLE 

-e-MMMM 




^^0 5 10 15 20 

state 

(f) Estimates of V generated by the different 
estimators on a run of MBS (270, 83) where 
the e^v of approximate MLE = 2.5862, e^y 
of MMMM = 3.6059 and e^v of WHAM = 
4.7896. 



Figure 5.3: Estimation results of umbrella sampling with non-Markovian simulations. 
The 1(7 co nfide nce intervals in (c), (d) and (e) are obtained by the approach described 
in Section 4.4 and those in (f ) are obtained from the sample standard deviation of V 



in the 30 independent runs. 
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-Approximate MLE 
-ilMMM 

WHAM 



Approximate MLE 

---MMMM 

WHAM 



(a) Average bav calculated over 30 indepen- 
dent runs of MBS (it, Mo) for K = 15 and 
Mo = 500, 910, 1657, 3017, 5493, 10000. 



120 
A' 



(b) Average eav calculated over 30 inde- 
pendent runs of MBS (X, Mo) for K = 
40, 80, 120, 160, 200 and Mo = [2000/K]. 



50- 
40- 
30- 
20- 
10 


-10- 
-20- 



True value 

Approximate MLE 

-S-MMMM 
-*-WHAM 

la iutervals 




50 r 
40- 
30- 
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True value 

Approximate MLE 

-Q-MMMM 
-•-WHAM 

la intervals 




(c) Estimates of V generated by the different 
estimators on a run of MBS (15, 500) where 
the e^Y of approximate MLE = 6.2858, e/^y 
of MMMM = 11.9022 and eav of WHAM 
= 12.4188. 



-5 5 

state 

(d) Estimates of V generated by the different 
estimators on a run of MBS (15, 10000) where 
the e^v of approximate MLE = 0.8246, e^v 
of MMMM = 3.1642 and sav of WHAM = 
5.7601. 



True value 

Approximate MLE 

-e-MMMM 
-•-WHAM 

la intervals 



,*u:'*ioL"* 



-5 5 

state 

(e) Estimates of V generated by the differ- 
ent estimators on a run of MBS (40, 50) where 
the bav of approximate MLE = 1.6589, cav 
of MMMM = 3.6707 and eAv of WHAM 
= 7.5342. 




— True value 
- - Approximate MLE 
0-MMMM 
• -WHAM 

la intervals 




(f) Estimates of V generated by the different 
estimators on a run of MBS (200, 10) where 
the EAV of approximate MLE = 2.8518, bav 
of MMMM = 6.5913 and cav of WHAM = 
10.9540. 



Figure 5.4: Estimation results of metadynamics with Markovian simulations. The la 
con fidence intervals in (c) and (d) are obtained by the approach described in Section 
4.4 and those in (e) and (f ) are obtained from the sample standard deviation of V in 



the 30 independent runs. 
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Approximate MLE 

---MMMM 

WHAM 



§00 910 1657 3017 5493 10000 

(a) Average bav calculated over 30 indepen- 
dent runs of MBS (it, Mo) for K = 15 and 
Mo = 500, 910, 1657, 3017, 5493, 10000. 



40 80 120 160 200 

A' 

(b) Average eav calculated over 30 inde- 
pendent runs of MBS (X, Mo) for K = 
40, 80, 120, 160, 200 and Mo = [2000/K]. 



True value 

Approximate MLE 




(c) Estimates of V generated by the different 
estimators on a run of MBS (15, 500) where 
the EAV of approximate MLE = 2.6950, eav 
of MMMM = 8.2297 and e^v of WHAM = 
9.7315. 



True value 

Approximate MLE 




5 10 15 20 

state 

(d) Estimates of V generated by the different 
estimators on a run of MBS (15, 10000) where 
the EAy of approximate MLE = 0.2325, eav 
of MMMM = 2.5563 and eav of WHAM = 
3.7233. 



True value 

Ajiproximate MLE 

-e-MMMM 
-»-WHAM 

la intervals 




5 10 15 20 

.state 

(e) Estimates of V generated by the differ- 
ent estimators on a run of MBS (40, 50) where 
the EAV of approximate MLE = 2.4020, eav 
of MMMM = 4.0281 and eav of WHAM 
= 6.3157. 



30- • 



True value 

Approximate MLE 

-Q-MMMM 
-•-WHAM 

la intervals 




5 10 15 20 

state 

(f) Estimates of V generated by the different 
estimators on a run of MBS (200, 10) where 
the EAV of approximate MLE = 2.5255, eav 
of MMMM = 8.0936 and eav of WHAM = 
8.6291. 



Figure 5.5: Estimation results of metadynamics with Markovian simulations. The la 
con fidence intervals in (c) and (d) are obtained by the approach described in Section 
and those in (e) and (f ) are obtained from the sample standard deviation of V in 



4.4 



the 30 independent runs. 
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to the discrete states, suggesting that at least the order parameters used to drive 
the umbrella sampling/metadynamics simulation should be discretized finely. The 
stationary probabilities or free energies are then reconstructed on the discrete states 
used. The estimator presented here has a number of advantages over existing meth- 
ods such as WHAM. Most importantly, in all scenarios tested here, the estimation 
error of the transition-matrix-based estimator was significantly smaller than that of 
existing estimation methods. The reason for this is that the estimator does not rely 
the biased simulation to fully equilibrate within one simulation condition, but only 
asks for local equilibrium in the discrete states - which is a much weaker requirement. 
As a consequence, the present method can also be used to estimate free energy pro- 
files and stationary distributions from metadynamics simulations using all simulation 
data. Previously, metadynamics simulations could only be analyzed using the fraction 
of the simulation generated after the free energy minima have been filled and the sim- 
ulation samples from an approximately fiat free energy landscape. These advantages 
may lead to very substantial savings of CPU time for a given system, and on the other 
hand, permit the simulation of systems that were otherwise out of reach. 



Appendix A. Proof of Theorem |3.1[ 

For convenience, here we define 9 to be the solution set defined by constraints 
in (3.11, 01 be the set of feasible solutions which satisfies Tfij = for {i,j,k) G 
k) \C^j + — and i ^ j^, 62 be the set of feasible solutions which satisfies 
1t.*>o ~ lc'=.>o f'-"' hi E^iid the condition 

(A.l) uj : C*"- > and l^fc >o = lc'=.>o ^'^^ hjjk 



Part (1). In this part, we will prove the optimal solution existence of (3.1 ). Sup- 
pose that (7r',T'\ . . . ,T"^^ is a feasible solution with objective value L' > —00. We 

can define a new objective function L+ (tt, , . . . , T^) = max{i (tt, , . . . , T^) , L'- 
a} where a > is a constant. It is easy to verify that L_|_ is a continuous function 
on Q. Thus, the optimization problem max(^ '^jfjge L_|_ (tt, T^, . . . , T^) has a 

global optimal solution (n", T"\ . . . , T"^^ with L+ (n", T"\ . . . , T'"^) ^ L" for 6 

is a closed set. Noting that L" > L' > L' - a, we have L (7r",T"\ . . . ,T"'^'^ = L". 
Therefore, for any (tt, T^, . . . , T^) e 9, L (tt, T^, . . . , T^) < L+ (tt, r\ . . . , T^^) = 
L(n'\r'\...,r'^ 



Part (2). In this part, we will prove the first conclusion of the theorem. Sup- 
pose that ( tt', T'^, . . . , T'^ ) is an optimal solution. We can define a new solution 



TT\r'\....r'^] with 



(A.2) T"^, = J H+c^.>o-^'^J^ 

Obviously, [t:\ T"\ . . . , T"^^ is a feasible solution belonging to 9i, and > T 
We have 

L (^', T"\. . . , T"^) = L (^', T'\ . . . , T'^) + (l°g^"' - log^'* 



(A.3) >lU',T'^,...,T' 



i.k 
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Therefore, (n',T"^,...,T"^] is also an optimal solution. 



Part (3). We now prove the second conclusion. Suppose there is an optimal 
solution ^7r',r'^. . . . ,T'^^^ belonging to 8i\02. Then there exist i,j,k such that 

T'*j = and > 0, and L (^n' , T'\ . . . , T'^^ = -oo. This leads to a contradiction 

with the optimality of ^7r',r'^, . . . ,T'^^. Thus, the optimal solution belonging Qi 
must be an element of Q2 if ^ holds. 

Appendix B. Pr oof of Theorem |3.8[ 

From Assumption EgI we have X'^^ = if A*; 0. Then Q{9) ,Q (0) > -00 

and we can define the following new functions, Q+ (9) — max |q (9) ,Q (9) — a| and 

Q+ (9) = max {Q {9) ,Q (9) — a}, where a > is a constant. 

Part (1). First of all, we will prove that 9 is the unique solution of maxgge Q {9)- 
Noting that n'^ = J^i ^u- have 



(B.l) 



fe i \ I / j 



According to the property of the KL divergence, Q {9} can achieve the maxi mal v alue 

that 



if and only if Tj'j = T^* for all i, j, k. Then we can conclude from Assumption 
9 = 9 is the unique solution of maxgge Q (9). 
Part (2). It is easy to verify that 



3.7 



(B.2) 



9 — argmaxQ (9) <^ 9 = argmax(5+ (9) 

eee eee 



and 9 — argmax^gQ (3+ (9). The proof is omitted because it is trivial. 

Part (3). In this part, we will prove that sup^gQ Q+ {9) — Q+ (9) 
event 



0. Define 



(B.3) 
and set 

(B.4) 



w : WkX^j > e for all (i, j, k) G Si and Q{9)>Q{9)-e 
'Q{9)-e- ' 



61 - <^ 9\Tl^ > exp 



for ii,j,k) e Si}r\e 



where Sj = {{i,j,k) \X^j > O} and e £ (0,mm(^ij^k)eSi 'WkX^^). We now analyze the 



value of 



{0) - Q+ (0) 



when Lo holds. 



Case (i) 9 e e\ei. There is a {i, j, k) such that [i, j, k) e Si and T^^ < exp ' '^^"'^ " ° 



We have 
(B.5) 
and 
(B.6) 



Q {9) < log 7^- < (^) - e - a < {O) 



'{9) < Wk^logTt^ <Q{9)~e-a<Q{9)-a 



HAO WU AND FRANK NOE 



23 



Then 9 e Qi and 



{t.j,k)esi 



(B.7) 



< 



e — a 



E 



log?; 



Case (ii) 9 G Qi. Investigating all possible orders among values of {Q (9) ,Q (9) — 
a, Q{9),Q (9) - a} (i.e., Q (9) < Q (9) ~ a < Q (9) < Q (9) - a, Q {9) < 
Q [9^ ~ a < Q [9) — a < Q [9) and so on) , we have 

Q+ (9) - Q+ {9)\ < max {|0 (0) - g , |q (^) - (^)|} 



max{|log7;^.|,|logi;^.|} 



(B.8) 



< 



e — a 



E 



WkX^j - WkX^j 



Combining the above results, we get 



(B.9) 1^ • sup Q+ (9) - Q+ (9) 



< 



Q{o) 



E 



WkX^j - WkX^j 



Moreover, considering that Wk Wk, X^j A X^j and 



(B.IO) Q{9)-Q{9) 



< 



i9) 



E 

(i.j,k)eSj 



WkX^j - WkX^j 



we have 1^ ^ 1. Therefore 



(B.ll) 



sup 

eee 



Q+ (9) - Q+ (9) 



According to the definitions of (9) and (5+ (^) ^^^d conclusions of Parts (1)- 
(3), it can be easily verified that (5+ (9) satisfies the following conditions: (i) {9) 
is uniquely maximized at 9, (ii) 8 is compact; (iii) (5+ {9) is continuous; (iv) (5+ {9) 
converges uniformly in probability to Q+ {9). Then we have 9 9 hy using Theorem 
2.1 in [14 and (|R2]). 



Appendix C. Proof of Theorem [3T9| 

For any k, 



(C.l) 



^MwkV^ - VMwkV^ = V^k^Mk (V^ - + {wk - Wk) 



Then VM (Vxw - Vxn.) 4aA(0,Ex^). 
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Let 8r be the feasible set of Or defined by constraints in (3.1), where Tjj and 
7r|5| which do not b elon g to 6^ can be treated as functions of bj. since the MLE is 
performed by using (3.2 1. It is easy to see that 9r is an interior point of O,,, and 

(C.2) 01 = {61^17^ (Or) > e for k) £ Si and vr, (6*^) > e for all i} n 9^ 

is a closed neighborhood of dr, where e G (O, min {niin(jj j,)g5^ T* , min^ Tf^ } ) and Sj 
has the same definition as in Appendix [B| 
It is easy to verify that 



MV, 



• AA(0,S) 



and supe^ge^ 
VM {Or - Or 



Ve^e^Q {6 (Or)) - Ve,eM {^r)) A 0. Then by Theorem 3.1 in jH], 



Appendix D. Proof of Remark |3.10[ 

In this appendix we will show that (3.10) holds if 
geometrically ergodic Markov chain. 



and {ut} is a 



Let TZ and (•) denote the state space and stationary distribution of {yt}- Then 
for any yi, t > 1 and set A C TZ x TZ, we have 

Pr ((yf , 2/ti) e Aly^y'^) - / /i (dy^) JC (y^ {y^,)) 



(D.l) 

with 
(D.2) 

where Aq 



(y^d2/f)g(yf)- / ^ (dyf ) (yf ) 



9 {y^) = 



0, 



y^ i ^0 



{wlthere exists v such that (u, w) g A}, yli (u) = {v\ (u, w) £ A}, K {u, B) 
Pr (yf''+;^ e -B|y('' = m) and /C" = Pr (yt\„ € Blyl" = u) denote the transition 

kernel and multi-step transition kernel of {yt^ ■ It is clear that \g (y^)| < 1 for any 
y^. Considering that {y^} is geometrically ergodic, we can deduce that there exist a 
function a (•) and a constant 77 G (0, 1) such that 

Pr((yf,y,V)GA|y^y^)- / /i (dyfO (yf , (y,'^)) 



<2dTv (/C*-i(y^.),M(-)) 



(D.3) 



< 2a (y^) 11'-' 



Because (D.3 1 holds for any A £1 TZ x TZ, the stochastic process |y't | with y'\ = 

{jJt^Vt+i) is also a geometrically ergodic Markov chain with stationary distribution 
n' (dy^, dy^j^-^ — fi (dy^) K. (y^, dy^^^) . Hence, combining Theorem 1.2 in [9 and the 

Cramer- Wold device [50], we can conclude that ^/Mk converges 
in distribution to a multivariate normal distribution with zero mean. 

Appendix E. Proof of Theorem \4.2\ 
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Part (1). It is clear that the modified count matrix Q in (4.11 satisfies that 



1 



Ic*" >o ^'^^ diagonal elements are all positive. Here we only prove the 

irreducibility of by contradiction. Assume without loss of generality that there 
exists n £ [l, |iS'^|) such that 



(E.l) 



0, for i < n and j > n 



Then C^j = Cj^ = for i < n and j > n, which implies the states si and S2 can not 
both appear in the fc-th simulation if I^k (si) < n and I^k (52) > n. This contradicts 



the condition that X^tHo fc(a:J)=i 

Part (2). We now show that X'' is irreducible in the same way as in the first 
part. Assume without loss of generality that there exists n € [l, |iS'^|) such that 



> for all i. So C is irreducible. 



(E.2) 



X^j — 0, for i < n and j > n 



Then X^j 



Xj ^ — for i < n and j > n since X'^ is symmetric. According to 
Assumption 3.6 for any si,S2 G S'' which satisfy Igk (si) < n and I^k (32) > n, 



the probability of that si and S2 both appear in the fc-th simulation is zero. This 
contradicts the ergodicity of the simulation. So X'' is irreducible. 
Part (3). In this part we will prove that l(jk^fjk A 1. Define 

(E.3) (jj : is an irreducible matrix and satisfy C^^ > and Ipfc. >o ~ ^c^ >o 

and 



3.5 



3.6 



we have 1 



^ ^x'' >o- Therefore 1,,, A 1 



According to Assumptions 

since X'' is a symmetric and irreducible matrix with positive diagonal elements. On 
the other hand, C'^ — C'' if uj holds. Hence lr>fc- 



4i. 



Appendix F. Proof of < 0. 



Define pj; to be the vector consisting of ^Z^^lQij > o| and Hp^ = J2i VpfeptZf. 
It is clear that pjj can be expressed as a linear function of p'' with = A'^p''. Then 



Hp can be written as 



According to (|4.9|, for any p''l = [z%\C^j > o}, 



im in 



I 



Noting that for any z, the term 



E 



i;*:- ( z' 



yje{i|cf^.>o} 




U ^3 



E 

je{j|c^-.>o} 



E 

Vje{j|c[.>o} 



V 
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is equal to the variance of a discrete random variable with distribution Pr \^ai = Z' ^ 
Tt, and support {z%\C!y^ > o} (7* = exp {~Z^) / exp (-Z^) > if > 0) 



We can conclude that H^^ < 0, and 



Hp^P't = if and only if 



(F.2) Z'ln = Z'l for any m, n G > o} , 1 < * < [^'^j 

Further, if there exists a p''^ such that p'^ = ^aP''^ ' P'a should satisfy 



(F.3) 



Z'f - = for any and 



E 



(^.]}e{(tMCl^>a} 



According to (F.2), (F.3 1 and Assumption 4.1 it is easy to see p'''^ A'f H'^^A^^'" = 
if and only if ^= 0, which implies = A'^^H^^A'^ < 0. 



Appendix G. Pro of of L em ma 

Under Assumptions 3.3||3.7| and |4.1 



4A[ 



it is easy to see that X'' is irreducible and its 
diagonal elements are positive (see Appendix |e]) , which implies that T*"' is an ergodic 
sition matrix. Thus V'' — {p^)- 
Define event 



(G.l) 

and 

(G.2) 



lc''.>o = Ix*" >o for all i,j 
dim (^f)^^ = dim (p^) 



p' 
0, 



dim ip) 7^ dim ip 



It is clear that the functions l^t •Q'^ {p^) and (p*^) satisfy: (i) p^ = argmaxpk l^k ■ 
(p'^) and Q'' (p'^) is uniquely maximized at p^, (ii) l^t • Q*^ (p*^) is concave for 
VpkpkQ'' (/) = Hf; < 0, (iii) l^fc •Q'' (p'') 4 Q*^ (/) for any p*^ because 



1. Then we have p''' A- p'' A- p'^ according to Theorem 2.7 in [14J, and 



yfc — > T/^'^ since (p'") is a continuous function of p'^. 

We now show the second conclusion of the lemma. Because p*^ = arg maXpt Q'' (p''') , 
Vp Q'' (p*^) = O"^. Then we have 



Mk[Vo{l^.-Q' (p^)) 



u'^o • (V,^ *'^- • (V (1") - V (X^) 



(G.3) 



4 (0, (Vp^$fe (p-'^))^ (Vp^$^- (p-'=) 



Furthermore, the Hessian matrices of l;^fc •Q'^ (p*^) and (p'') satisfy Vpfept (l^^fc • {p'^)) 
'^pkpkQ'^ (p*^) for any p'^ . Using the mean value theorem and Theorem 3.1 in [T4j, we 

can conclude that a/M^ (p*" - p^) A- Af (O, Ej;) and 

7^4 (V^'^ - I/'-) A v^Vp. V| (ap'-^ + (1 - a) p") {p" - p") 
(G.4) 4aA(0,I]^) 
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where a G [0, 1] is a function of p*' . 

Appendix H. Proof of Theorem |4.5 



Since the vahie of V will not be affected if we replace with w'^X^ = /M, 
V can be expressed as 

K 



fe=i / 

/ K 

(H.l) • (^^'1', P') {V"" - A'^U'^) 



\fe=i 

K 



- k 

and y = argmaxiTy^o Qq [V; {V'^^) with 

K 



^ -k 

,p 



k=l 



(H.2) • {A'' {V + U^) - V^) 

Part (1). We first prove that V is the unique maximum point of Qq{V\ {t^''}) un- 
der the constraint l^V — 0. It is clear that V" is a maximum point since Qq{V] {t^*^}) = 
and Qq{V \{V^}) < 0. We now show the uniqueness of V by contradiction. Sup- 
pose that V' is another maximum point which satisfies Qq{V'; {V'^}) = and V' ^ V. 

Then [I ]{A'' {V + U^) - t^*^) = for any k because [ I ] {w^x' ,p^^ ■ 

[I < 0, which impHes that the first j^''! - 1 elements of A'' [V + [/'=) - 
are zero. Further, considering that 1^ A^ {V + U^) = l^V^ = 0, we can conclude 
that A^ [V + U^) — fo r eac h k. Therefore probability distribution tt' = [tt'^J with 



Tr'i (X exp(— y'i) satisfies (B.5), which contradicts Assumption 3.7 Thus V is the 
unique solution of argmaxiTy^Q Qq{V] {V^}). 

Part (2). In this part we will prove that null (^Ef=i A^^'E!' {w^X^ ,p^^ A'^^ = 
span (1) for any ,X'^ ,p^. It is clear that span (1) C null (^X^^Li A^^'El^ {w'^X^ ,p^^ A 

since A^l = 0. Suppose that v ^ span (1) is another vector which belongs to l^v ~ 0, 
then V-\-v 7^ is also a maximum point of argmaxj^Ty^Q Qq {V; {V^^^ . This is a con- 
tradiction to the result of Part (1). Therefore null {^^=1 ^^^'^^ {w^x" ,p^^ A''^ = 
span (1). 

Part (3). According to the result of Part (2) and Theorem 5.2 in [24J, we have 
(H.3) 




Substituting (H.l I and (H.Sl into V — V and \/M (V — V) leads to the conclusions 
of the theorem. 

Appendix I. Proof of Theorem |4.7[ 

Let G (•) be a function of with 

(I.l) G{y^) = [G,, (y^)] = [l,^^ (^.(^.))^, Pr (l^. (x^,) ^ j\y^) 
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and K% {h) = Gov (V (G (y^'^)) , V (G (yf+z,))) be the h lag autocovariance of {V (G {lit))]- 
It is easy to verify that [h) and k*^ (/i + 1) composed of the same elements but in 
different arrangements for h>Q, and ■q'^ [1) = tr [kq (21) + Kq {21 + 1)). 

From the above results, we can conclude that 77'= (/) = tr {k^ {21) + k% {21 + 1)) 
is a non-negative and decreasing function of I for / > 0, and Kq (20 + Kg (2^ + 1) > 
by using Theorem 3.1 in |7|. Therefore the second conclusion of the theorem can be 
proved since tr (k^ (2^ + {21 + 1)) > \\k% {21) + n% {21 + 1)11^^^^^. 

We now show the first conclusion. According to Theorem 2.1 in and considering 
that X^ftLo '^G (^) convergent, we have 



(1.2) lim A^Var , 



On the other hand, 



Af-l \ 00 

-^V(G(y,^-)) =4(0) + 2^4W 

t=0 / h=l 



K% (h) 



-^V(G(yf))U4(0)+2^ 

t=0 / h=l 

Combining the above two equations leads to 

N-l , N-1 _ 

(1.4) Jim E ^4 {h) - Ji^m E V^'^ ('^^ - 

h=l h=l 



Therefore 




'k'' {h)+K^ (/l)^) 

(1.5) ^ k'' (0) + E {'^'^ (M + i^"" {h)^] 



h=l 

as N -^00, and = (0) + YlZo {^'' (0 + (0^^ 
Appendix J. Metropolis simulation model . 

The Metropolis simulation model generates the xl^^.j^j^ by the following steps if the 
reference V and biasing potential U'' are given: 
Step 1. Let i := and sample Xq ^ Pr (x§ = Si) oc exp {—Uf). 
Step 2. Sample a;' = Sj from the proposal distribution Pr {x' — Sj\xi — Si) — qij (x 

li»-ii<9 and calculate the acceptance ratio a = min | ''^^{ — -tt-^, 1 \ with 
I I oxp(-y,-;7*)<jj. ' J 



X^ — Si- 

Step 3. Set xji^^ to be x' with probability a and x^ with probability 1 — a. 
Step 4. Terminate the simulation if t = Mk- Otherwise, let i := t + 1 and go to Step 
2. 
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